Description

[0001] The present invention relates to a method of automatic text processing. In particular, the present invention relates to an automatic method of generating thematic summaries of documents.
[0002] Document summaries and abstracts serve a valuable function by reducing the time required to review documents. Summaries and abstracts can be generated after document creation either manually or automatically. Manual summaries and abstracts can be of high quality but may be expensive because of the human labor required. Automatic summaries and abstracts can be cheaper to produce, but obtaining high quality consistently is difficult.
[0003] Systems for generating automatic summaries rely upon one of two computational techniques: natural language processing or quantitative content analysis. Natural language processing is computationally intensive. Additionally, producing semantically correct summaries and abstracts is difficult using natural language processing when document content is not limited.
[0004] Quantitative content analysis relies upon statistical properties of text to produce summaries. Gerald Salton discusses the use of quantitative content analysis to summarize documents in "Automatic Text Processing" (1989). The Salton summarizer first isolates text words within a corpus of documents. Next, the Salton summarizer flags as title words those words used in titles, figures, captions, and footnotes. Afterward, the frequency of occurrence of the remaining text words within the document corpus is determined. The frequency of occurrence and the location of text words are then used to generate word weights. The Salton summarizer uses the word weights to score each sentence of each document in the document corpus. These sentence scores are used in turn to produce a summary of a predetermined length for each document in the document corpus. Summaries produced by the Salton summarizer may not accurately reflect the themes of individual documents because word weights are determined based upon their occurrence across the document corpus, rather than within each individual document.
[0005] Edmundson, H. P.: "New Methods In Automatic Extracting", Journal of the Association For Computing Machinery, vol. 16, April 1969, pages 264-285, XP002078269 discloses methods of automatically extracting documents for screening purposes, i.e. the computer selection of sentences having the greatest potential for conveying to the reader the substance of the document. The described methods treat, in addition to the presence of high-frequency content words (keywords), three components: pragmatic words (cue words); title and heading words; and structural indicators (sentence location).

[0006] US-A-5 384 703 describes automatically forming a summary by selecting regions of a document. Each selected region includes at least two members of a seed list. The seed list is formed from a predetermined number of the most frequently occurring complex expressions in the document that are not on a stop list. If the summary is too long, the region selection process is performed on the summary to produce a shorter summary. This region selection process is repeated until a summary is produced having a desired length. Each time the region selection process is repeated, the seed list members are added to the stop list and the complexity level used to identify frequently occurring expressions is reduced.

[0007] "Method for Automatic Extraction Of Relevant Sentences From Texts", IBM Technical Disclosure Bulletin, vol. 33, no. 6A, November 1990, pages 338-339, XP002015802 describes a method for automatically extracting from a text in any language the most significant and explicative sentences. All the words of the text are analyzed and listed according to an order of importance. Only useful words are considered, excluding "stop words" such as articles, pronouns, prepositions, etc. The importance of selected words is evaluated according to their frequency and position within the text. Words with high frequency that are located in the title, introduction or conclusion are considered of increased importance.

[0008] Black, W. J. et al.: "A Practical Evaluation Of Two Rule-Based Automatic Abstracting Techniques", Expert Systems For Information Management, vol. 1, no. 3, 1988, pages 159-177, XP002015761 describes automatic abstracting techniques based on statistical analyses and superficial syntactic pattern matching of texts, which lend themselves to implementation using a rule-based approach.

[0009] Luhn, H. P.: "The Automatic Creation Of Literature Abstracts", IBM Journal, April 1958, pages 159-165, XP00207827a describes a method for automatically creating an abstract from an article presented in machine-readable form to a data-processing machine, whereby statistical information is derived from word frequency and distribution. This statistical information is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the "auto-abstract".

[0010] An object of the present invention is to automatically generate improved document summaries that accurately reflect the themes of each document.

[0011] According to the present invention, there is provided a processor implemented method of generating a thematic summary of a document presented in machine-readable form to the processor, the document including a first multiplicity of sentences and a second multiplicity of terms, the processor implementing the method by executing instructions stored in electronic form in a memory device coupled to the processor, the processor implemented method comprising the steps of: a) determining a value of a first number of thematic terms based upon a value of a second number representing a length of the thematic summary, the first number being less than the second number; b) selecting the first number of thematic terms from the second multiplicity of terms; c) scoring each sentence of the first multiplicity of sentences based upon occurrence of thematic terms in each sentence; and d) selecting the second number of thematic sentences from the first multiplicity of sentences based upon the score of each sentence.

[0012] The method of the invention automatically produces readable and semantically correct document summaries. The method requires a user, at most, to specify the length of the desired summary. Document summaries may be automatically generated without using an iterative approach.

[0013] A technique for automatically generating thematic summaries of machine readable documents will be described. The technique begins with identification of thematic terms within the document. Next, each sentence of the document is scored based upon the number of thematic terms contained within the sentence. Afterward, the highest scoring sentences are selected as thematic sentences. The present invention will now be described, by way of example, with reference to the accompanying drawings, in which:

Figure 1 illustrates a computer system for automatically generating thematic summaries of documents, and

Figure 2 is a flow diagram of a method of generating a thematic summary of a document using the computer system of Figure 1.

Figure 1 illustrates in block diagram form computer system 10 in which the present method is implemented. The present method alters the operation of computer system 10, allowing it to generate a thematic summary of any document presented in machine readable form. Briefly described, computer system 10 generates a thematic summary by identifying thematic terms within the document and then scoring each sentence of the document based upon the number of thematic terms contained within the sentence. Afterward, computer system 10 selects the highest scoring sentences as thematic sentences and presents those sentences to a user of computer system 10.

[0014] Prior to a more detailed discussion of the present method, consider computer system 10. Computer system 10 includes monitor 12 for visually displaying information to a computer user. Computer system 10 also outputs information to the computer user via printer 13. Computer system 10 provides the computer user multiple avenues to input data. Keyboard 14 allows the computer user to input data to computer system 10 by typing. By moving mouse 16 the computer user is able to move a pointer displayed on monitor 12. The computer user may also input information to computer system 10 by writing on electronic tablet 18 with a stylus or pen 20. Alternatively, the computer user can input data stored on a magnetic medium, such as a floppy disk, by inserting the disk into floppy disk drive 22. Optical character recognition unit (OCR unit) 24 permits the computer user to input hardcopy documents 26 into the computer system, which OCR unit 24 then converts into a coded electronic representation, typically American National Standard Code for Information Interchange (ASCII).

[0015] Processor 11 controls and coordinates the operations of computer system 10 to execute the commands of the computer user. Processor 11 determines and takes the appropriate action in response to each user command by executing instructions stored electronically in memory, either memory 28 or on a floppy disk within disk drive 22. Typically, operating instructions for processor 11 are stored in solid state memory 28, allowing frequent and rapid access to the instructions. Semiconductor memory devices that can be used include read only memories (ROM), random access memories (RAM), dynamic random access memories (DRAM), programmable read only memories (PROM), erasable programmable read only memories (EPROM), and electrically erasable programmable read only memories (EEPROM), such as flash memories.
[0016] Figure 2 illustrates in flow diagram form the instructions 40 executed by processor 11 to generate a thematic summary of a machine readable document. Instructions 40 may be stored in solid state memory 28 or on a floppy disk placed within floppy disk drive 22. Instructions 40 may be realized in any computer language, including LISP and C++.
[0017] Initiating execution of instructions 40 requires selection and input of a document in electronic form. If desired, prior to initiating execution of instructions 40 the computer user may also change the length, denoted "S", of the thematic summary from the default length. The default length of the thematic summary may be set to any arbitrary number of sentences. In an embodiment intended for document browsing, the default length of the thematic summary is set to five sentences.
[0018] Processor 11 responds to selection of a document to be summarized by branching to step 42. During step 42 processor 11 tokenizes the selected document into words and sentences. That is to say, processor 11 analyzes the machine readable representation of the selected document and identifies sentence boundaries and the words within each sentence.
[0019] Tokenization of natural language text is well known and therefore will not be described in detail herein. Additionally, during tokenization, processor 11 assigns a sentence I.D. to each sentence of the document. In one embodiment, each sentence is identified by a number representing its location with respect to the start of the document. Other methods of identifying the sentences may be used without affecting the present method. After tokenizing the selected document, processor 11 branches from step 42 to step 44.
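
Step 42 might be sketched as follows. This is only an illustration: the function name `tokenize`, the sentence boundary rule, and the word pattern are assumptions, since the description leaves tokenization to well-known techniques. Sentence I.D.s are numbered from 1 in order of occurrence, as in the embodiment described above.

```python
import re


def tokenize(document):
    """Return {sentence_id: [word, ...]} with sentence I.D.s counted from 1."""
    # Assumed boundary rule: '.', '!' or '?' followed by whitespace ends a sentence.
    raw_sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    sentences = {}
    for sentence_id, text in enumerate(filter(None, raw_sentences), start=1):
        # Assumed word rule: runs of letters, digits and apostrophes, lowercased.
        sentences[sentence_id] = re.findall(r"[a-z0-9']+", text.lower())
    return sentences


tokens = tokenize("The cat sat. The dog barked! Why?")
```

A rule this simple will split abbreviations such as "H. P." incorrectly; a production tokenizer would need a more careful boundary test.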
[0020] Processor 11 examines each word token of the document during step 44 and compares the word to the terms already included in a term list. If the word token is not yet included on the list, then processor 11 adds the word to the term list and notes the sentence I.D. of the sentence in which the word occurs. On the other hand, if the word is already on the term list, processor 11 simply adds the sentence I.D. for that word token to the entry, or list, for that term. In other words, during step 44 processor 11 generates a data structure associating words of the document with the location of every occurrence of that term. Thus, for example, a term list entry of "apostasy, 7, 9, 12" indicates that the term "apostasy" occurs in sentences 7, 9, and 12 of the document.
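
The term list of step 44 can be pictured as a mapping from each term to the sentence I.D.s of its occurrences. The sketch below assumes a dictionary representation and a hypothetical helper name `build_term_list`; the description does not prescribe a particular data structure.

```python
from collections import defaultdict


def build_term_list(sentences):
    """Map each term to the sentence I.D. of every one of its occurrences.

    `sentences` is assumed to be {sentence_id: [word, ...]}.
    """
    term_list = defaultdict(list)
    for sentence_id, words in sentences.items():
        for word in words:
            # One I.D. is recorded per word token, so a term that occurs twice
            # in the same sentence contributes that I.D. twice.
            term_list[word].append(sentence_id)
    return dict(term_list)


example = build_term_list({
    7: ["apostasy", "spread"],
    9: ["apostasy"],
    12: ["apostasy", "ended"],
})
```

Here `example["apostasy"]` holds the I.D.s 7, 9 and 12, matching the entry "apostasy, 7, 9, 12" used in the text.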
[0021] Preferably, while generating the term list, processor 11 filters out stop words. As used herein, "stop words" are words that do not convey thematic meaning and occur very frequently in natural language text. Most pronouns, prepositions, determiners, and "to be" verbs are classified as stop words. Thus, for example, words such as "and, a, the, on, by, about, he, she" are stop words. Stop words within the document are identified by comparing the word tokens for the document to a list of stop words. Eliminating stop words from the term list is not necessary, but doing so reduces the total processing time required to generate a thematic summary of a document.
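
Stop-word filtering amounts to a membership test against a fixed list. In this sketch the set contains only the eight example words quoted above; a real stop list would be far longer, and the names `STOP_WORDS` and `remove_stop_words` are illustrative only.

```python
# Only the examples given in the text; a practical list would be much larger.
STOP_WORDS = frozenset({"and", "a", "the", "on", "by", "about", "he", "she"})


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Drop word tokens that appear on the stop list (case-insensitive)."""
    return [word for word in words if word.lower() not in stop_words]


kept = remove_stop_words(["The", "heretic", "and", "the", "apostasy"])
```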

[0022] Processor 11 branches to step 46 from step 44 after completing the term list. During step 46 processor 11 analyzes the term list to determine the number of times each term occurs in the document. This is done simply by counting the number of sentence I.D.s associated with the term. That done, processor 11 branches to step 50.
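
Because the term list records one sentence I.D. per occurrence, the count of step 46 is just the length of each entry. A minimal sketch, assuming the dictionary representation of the term list (the name `term_counts` is hypothetical):

```python
def term_counts(term_list):
    """Count occurrences of each term as the number of recorded sentence I.D.s."""
    return {term: len(sentence_ids) for term, sentence_ids in term_list.items()}


counts = term_counts({"apostasy": [7, 9, 12], "heresy": [7, 7]})
```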

[0023] After initiation of execution and prior to execution of step 60, during step 48, processor 11 determines the number of thematic terms to be used in selecting thematic sentences. That number, denoted "K", is determined based upon the length of the thematic summary; i.e., based upon S. In general, K should be less than S and greater than 1. Requiring K to be less than S ensures some commonality of theme between selected thematic sentences. Preferably, K is determined according to the equation:

\[
K = \begin{cases} S \times c_1 & \text{if } S \times c_1 > 3 \\ 3 & \text{otherwise;} \end{cases}
\]

where:

$c_1$ is a constant whose value is less than 1;

S is the number of sentences in the thematic summary; and

K is the number of thematic terms.

In one embodiment, the value of $c_1$ is set equal to 0.7.
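
The rule for K can be written as a small function. The text does not say how a fractional product of S and the constant is converted to a whole number of terms; truncation is assumed here purely for illustration, and the name `num_thematic_terms` is hypothetical.

```python
def num_thematic_terms(s, c1=0.7):
    """Return K for a summary of s sentences, with a floor of three terms."""
    scaled = s * c1
    # Assumption: a fractional product is truncated to an integer.
    return int(scaled) if scaled > 3 else 3


k_default = num_thematic_terms(5)  # default summary length of five sentences
```

With the default length of five sentences the product is 3.5, which under the truncation assumption gives three thematic terms.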

[0024] Armed with a value for K and the term counts generated during step 46, processor 11 begins the process of selecting K thematic terms. During step 50, processor 11 sorts the terms of the term list according to their counts; i.e., the total number of occurrences of each term in the document. Ties between two terms having the same count are broken in favor of the term including the greatest number of characters. Having generated a sorted term list and stored the list in memory, processor 11 branches from step 50 to step 52. During step 52 processor 11 selects from the sorted term list the K terms with the highest counts. That done, processor 11 advances to step 54.
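
Steps 50 and 52 can be sketched as a sort on two keys: count first, then term length to break ties. The function name `select_thematic_terms` is an assumption, and the text does not say what happens when two terms agree in both count and length; in this sketch they simply keep their original order.

```python
def select_thematic_terms(counts, k):
    """Return the k terms with the highest counts, longer terms winning ties."""
    ranked = sorted(counts, key=lambda term: (-counts[term], -len(term)))
    return ranked[:k]


thematic = select_thematic_terms(
    {"soviet": 4, "country": 4, "gulf": 2, "plans": 3}, k=3
)
```

In this example "country" and "soviet" both occur four times, so the longer term "country" is ranked first.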

[0025] During step 54 processor 11 computes the total number of occurrences of the K thematic terms in the document. That number, denoted "N", is calculated by summing the counts of the K thematic terms. Processor 11 branches to step 56 from step 54.
[0026] Having selected the thematic terms and determined their counts, processor 11 is ready to begin evaluating the thematic content of the sentences of the document. During steps 56, 58, 60, and 62, processor 11 considers only those sentences that include at least one of the K thematic terms. Processor 11 does so by examining the K highest scoring terms of the sorted term list. After selecting a term, denoted $t_s$, during step 56, processor 11 examines each sentence I.D. associated with $t_s$ during step 58. For each sentence I.D. associated with $t_s$, processor 11 increments that sentence's score. Preferably, the score for each sentence is incremented by s, where s is expressed by the equation:

\[
s = \mathrm{count}_{t_s}\,[\,c_2 + \mathrm{freq}_{t_s}\,];
\]

where:

$\mathrm{count}_{t_s}$ is the number of occurrences of $t_s$ in the sentence;

$c_2$ is a constant having a non-zero, positive value; and

$\mathrm{freq}_{t_s}$ is the frequency of the selected term $t_s$.

$\mathrm{freq}_{t_s}$ is given by the expression:

\[
\mathrm{freq}_{t_s} = \mathrm{count}_{t_s} / N;
\]

where:

N represents the total number of occurrences of thematic terms within the document.

Preferably, $c_2$ is set to a value of one.
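
The increment can be checked with a short numeric sketch. It rests on one interpretive assumption: the count multiplying the bracket is taken to be the term's occurrences in the sentence, while the count in the frequency expression is taken to be the term's occurrences in the whole document, which is consistent with N being a document-wide total. The function names are hypothetical.

```python
def total_thematic_occurrences(counts, thematic_terms):
    """N: the summed document counts of the thematic terms (step 54)."""
    return sum(counts[term] for term in thematic_terms)


def score_increment(count_in_sentence, count_in_document, n, c2=1.0):
    """s = count_in_sentence * (c2 + freq), with freq = count_in_document / n."""
    freq = count_in_document / n
    return count_in_sentence * (c2 + freq)


counts = {"country": 4, "soviet": 4, "plans": 2, "gulf": 1}
n = total_thematic_occurrences(counts, ["country", "soviet", "plans"])
increment = score_increment(count_in_sentence=2, count_in_document=4, n=n)
```

Here N is 4 + 4 + 2 = 10, so a sentence containing "country" twice gains 2 × (1 + 4/10) = 2.8.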

[0027] Sentence scores can be tracked by generating a sentence score list during step 58. Each time processor 11 selects a sentence I.D., the sentence score list is examined to see if it includes that sentence I.D. If not, the sentence I.D. is added to the sentence score list and its score is increased as appropriate. On the other hand, if the sentence score list already includes the particular sentence I.D., then the score already associated with the sentence is incremented in the manner discussed previously.

[0028] After incrementing the scores of all sentences associated with the selected term, $t_s$, processor 11 branches from step 58 to step 60. During step 60 processor 11 determines whether all the thematic terms have been evaluated. If not, processor 11 returns to step 56 to select another thematic term as the selected term. Processor 11 branches through steps 56, 58, and 60 as described previously until all of the thematic terms have been examined. When that event occurs, processor 11 branches to step 62 from step 60.
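
The loop through steps 56, 58 and 60 might look like the sketch below, with the sentence score list held as a dictionary. It assumes the term list maps each term to one sentence I.D. per occurrence and that each sentence is incremented once per thematic term, by its in-sentence count times the bracketed factor; `score_sentences` is a hypothetical name.

```python
from collections import Counter, defaultdict


def score_sentences(term_list, thematic_terms, c2=1.0):
    """Return {sentence_id: score} for sentences containing a thematic term."""
    # Step 54: N is the total number of occurrences of the thematic terms.
    n = sum(len(term_list[term]) for term in thematic_terms)
    scores = defaultdict(float)  # the sentence score list
    for term in thematic_terms:  # steps 56 and 60: visit each thematic term once
        freq = len(term_list[term]) / n
        # Step 58: group the recorded I.D.s to get the per-sentence count.
        for sentence_id, in_sentence in Counter(term_list[term]).items():
            scores[sentence_id] += in_sentence * (c2 + freq)
    return dict(scores)


scores = score_sentences(
    {"apostasy": [7, 9, 12], "heresy": [7, 7], "gulf": [3]},
    thematic_terms=["apostasy", "heresy"],
)
```

Sentence 3 never receives a score because it contains no thematic term, which reflects the statement that only sentences including at least one thematic term are considered.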
[0029] During step 62 processor 11 selects as the thematic summary the S sentences with the highest scores. Processor 11 does this by sorting the sentence score list by score. Having selected the thematic sentences, processor 11 may present the thematic summary to the user via monitor 12 or printer 13, as well as storing the thematic summary in memory 28 or to floppy disk for later use. The sentences of the thematic summary are preferably presented in their order of occurrence within the document. While the sentences may be presented in paragraph form, presentation of each sentence individually is preferable because the sentences may not logically form a paragraph. Generation of the thematic summary complete, processor 11 branches to step 64 from step 62.
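
Step 62 can be sketched as two sorts: one by score to pick the S best sentences, and one by sentence I.D. to restore document order for presentation. The name `select_summary` and the dictionary inputs are assumptions made for this illustration.

```python
def select_summary(scores, sentence_text, s=5):
    """Return the s highest scoring sentences in their order of occurrence.

    `scores` maps sentence I.D. to score; `sentence_text` maps I.D. to text.
    """
    top_ids = sorted(scores, key=lambda sid: scores[sid], reverse=True)[:s]
    # I.D.s number sentences from the start of the document, so sorting the
    # selected I.D.s restores the original order.
    return [sentence_text[sid] for sid in sorted(top_ids)]


summary = select_summary(
    {7: 4.4, 9: 1.6, 12: 1.2, 3: 2.0},
    {3: "Third.", 7: "Seventh.", 9: "Ninth.", 12: "Twelfth."},
    s=2,
)
```

Returning a list of separate sentences, rather than joining them, matches the stated preference for presenting each sentence individually.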

[0030] Thus, a method of automatically generating thematic summaries for documents has been described. The method relies upon quantitative content analysis to identify thematic words, which are used in turn to identify thematic sentences. Appendix A and Appendix B include summaries generated using this method.

Appendix A: Summary of Shevardnadze's Resignation Speech

[0031] I have drawn up the text of such a speech, and I gave it to the secretariat, and the deputies can acquaint themselves with it - what has been done is the sphere of current policy by the country's leadership, by the President and by the Ministry of Foreign Affairs, and how the current conditions are shaping up for the development of the country, for the implementation of the plans for our democratization and renewal of the country, for economic development and so on.

[0032] Yesterday there were speeches by some comrades - they are our veterans - who raised the question of the need for a declaration to be adopted forbidding the President and the country's leadership from sending troops to the Persian Gulf. And these speeches yesterday, comrades, filled the cup of patience to overflowing.

[0033] On about 10 occasions, both in the country and abroad, I have had to speak and explain the attitude of the Soviet Union toward this conflict.

[0034] In that case we would have had to strike through everything that has been done in recent years by all of us, by the whole country and by all of our people in the field of asserting the principles of the new political thinking.

[0035] Second, I have explained repeatedly, and Mikhail Sergeyevich spoke of this in his speech at the Supreme Soviet, that the Soviet leadership does not have any plans - I do not know, maybe someone else has some plans, some group - but official bodies, the Ministry of Defense charges are made that the Foreign Minister plans to land troops in the Persian Gulf, in the region.

[0036] The third issue, I said there, and I confirm it and state it publicly, that if the interests of the Soviet people are encroached upon, if just one person suffers - wherever it may happen, in any country, not just in Iraq but in any other country - yes, the Soviet Government, the Soviet side will stand up for the interests of its citizens.

[0037] I say that, all the same, this is not a random event. Excuse me, I am now going to recall the session of the Supreme Soviet. On Comrade Lukyanov's initiative, literally just before the start of a meeting, a serious matter was included on the agenda about the treaties with the German Democratic Republic.

[0038] I cannot reconcile myself with what is happening in my country and to the trials which await our people.

Appendix B: Summary of "Research that Reinvents the Corporation" by John Seely Brown

[0039] As companies try to keep pace with rapid changes in technology and cope with increasingly unstable business environments, the research department has to do more than simply innovate new products.

[0040] Over the next decade, PARC researchers were responsible for some of the basic innovations of the personal-computer revolution - only to see other companies commercialize these innovations more quickly than Xerox.

[0041] One popular answer to these questions is to shift the focus of the research department away from radical breakthroughs toward incremental innovation, away from basic research toward applied research.

[0042] Our emphasis on pioneering research led us to redefine what we mean by technology, by innovation, and indeed by research itself.

[0043] Such activities are essential for companies to exploit successfully the next great breakthrough in information technology: "ubiquitous computing," or the incorporation of information technology in a broad range of everyday objects.

[0044] When corporate research begins to focus on a company's practice as well as its products, another principle quickly becomes clear: innovation isn't the privileged activity of the research department. At PARC, we are studying this process of local innovation with employees on the front lines of Xerox's business and developing technologies to harvest its lessons for the company as a whole.

[0045] The result: important contributions to Xerox's core products but also a distinctive approach to innovations with implications far beyond our company.



Claims 

1. A processor implemented method of generating a thematic summary of a document (26) presented in machine-readable form to the processor (11), the document (26) including a first multiplicity of sentences and a second multiplicity of terms, the processor (11) implementing the method by executing instructions stored in electronic form in a memory device (28) coupled to the processor (11), the processor implemented method comprising the steps of:

a) determining (48) a value of a first number of thematic terms based upon a value of a second number representing a length of the thematic summary, the first number being less than the second number;

b) selecting (52) the first number of thematic terms from the second multiplicity of terms;

c) scoring (58) each sentence of the first multiplicity of sentences based upon occurrence (54) of thematic terms in each sentence; and

d) selecting (62) the second number of thematic sentences from the first multiplicity of sentences based upon the score of each sentence.

2. The processor implemented method of claim 1 wherein step b) comprises:

i) determining (46) a number of times each term of the second multiplicity of terms occurs in the document, and

ii) selecting (52) the first number of thematic terms from the second multiplicity of terms based upon the number of times (50) each term occurs in the document.

3. The processor implemented method of claim 1 or claim 2 further comprising the step of:

e) presenting the thematic sentences to a user of the processor (11) in an order in which the thematic sentences occur in the document.

4. The processor implemented method of any one of claims 1 to 3 wherein step c) comprises incrementing (58) the score of each sentence for each thematic term occurring in the sentence by an amount related to the frequency of occurrence of the thematic term within the document.

5. The processor implemented method of any one of claims 1 to 4 comprising, prior to step a), the step of receiving the value of the second number from an input device (14) coupled to the processor (11).

6. The processor implemented method of any one of claims 1 to 5 wherein the first number is at least three.



Patentansprüche

1. Prozessorimplementiertes Verfahren zum Erzeugen einer thematischen Zusammenfassung eines Dokuments (26), das dem Prozessor (11) in maschinenlesbarer Form präsentiert wird, wobei das Dokument (26) eine erste Vielzahl von Sätzen und eine zweite Vielzahl von Termen umfasst und der Prozessor (11) das Verfahren durch Ausführen von Instruktionen implementiert, die in elektronischer Form in einer Speichereinrichtung (28), die mit dem Prozessor (11) gekoppelt ist, gespeichert sind, wobei das prozessorimplementierte Verfahren die folgenden Schritte aufweist:

a) Bestimmen (48) eines Wertes einer ersten Anzahl von thematischen Termen beruhend auf einem Wert einer zweiten Anzahl, die eine Länge der thematischen Zusammenfassung repräsentiert, wobei die erste Anzahl kleiner als die zweite Anzahl ist;

b) Auswählen (52) der ersten Anzahl von thematischen Termen von der zweiten Vielzahl von Termen;

c) Bewerten (58) jedes Satzes der ersten Vielzahl von Sätzen beruhend auf einem Auftreten (54) von thematischen Termen in jedem Satz; und

d) Auswählen (62) der zweiten Anzahl von thematischen Sätzen von der ersten Vielzahl von Sätzen beruhend auf der Bewertung jedes Satzes.

2. Prozessorimplementiertes Verfahren nach Anspruch 1, wobei Schritt b) aufweist:

i) Bestimmen (46) einer Häufigkeit, mit der jeder Term der zweiten Vielzahl von Termen in dem Dokument auftritt, und

ii) Auswählen (52) der ersten Anzahl von thematischen Termen von der zweiten Vielzahl von Termen beruhend auf der Häufigkeit (50), mit der jeder Term in dem Dokument auftritt.

3. Prozessorimplementiertes Verfahren nach Anspruch 1 oder 2, des Weiteren aufweisend den Schritt:

e) Präsentieren der thematischen Sätze zu einem Benutzer des Prozessors (11) in einer Reihenfolge, in der die thematischen Sätze in dem Dokument auftreten.

4. Prozessorimplementiertes Verfahren nach einem der Ansprüche 1 bis 3, wobei Schritt c) ein Inkrementieren (58) der Bewertung von jedem Satz umfasst, für jeden thematischen Term, der in dem Satz auftritt, um einen Betrag, der sich auf die Häufigkeit des Auftretens des thematischen Terms innerhalb des Dokuments bezieht.

5. Prozessorimplementiertes Verfahren nach einem der Ansprüche 1 bis 4, umfassend, vor Schritt a), den Schritt des Empfangens des Werts der zweiten Anzahl von einer Eingabeeinrichtung (14), die mit dem Prozessor (11) gekoppelt ist.

6. Prozessorimplementiertes Verfahren nach einem der Ansprüche 1 bis 5, wobei die erste Anzahl mindestens 3 ist.



Revendications

1. Procédé mis en oeuvre par un processeur de création d'un résumé thématique d'un document (26) présenté sous forme pouvant être lue par une machine au processeur (11), le document (26) comprenant une première variété de phrases et une seconde variété de termes, le processeur (11) mettant en oeuvre le procédé en exécutant les instructions stockées sous forme électronique dans un dispositif à mémoire (28) accouplé au processeur (11), le procédé mis en oeuvre par un processeur comprenant les étapes :

a) de détermination (48) d'une valeur d'un premier nombre de termes thématiques sur la base d'une valeur d'un second nombre représentant une longueur du résumé thématique, le premier nombre étant inférieur au second nombre ;

b) de sélection (52) du premier nombre de termes thématiques à partir de la seconde variété de termes ;

c) d'évaluation (58) de chaque phrase de la première variété de phrases sur la base de l'apparition (54) de termes thématiques dans chaque phrase ; et

d) de sélection (62) du second nombre de phrases thématiques à partir de la première variété de phrases sur la base de l'évaluation de chaque phrase.

2. Le procédé mis en oeuvre par un processeur selon la revendication 1, dans lequel l'étape b) comprend :

i) la détermination (46) d'un nombre de fois que chaque terme de la seconde variété de termes apparaît dans le document, et

ii) la sélection (52) du premier nombre de termes thématiques à partir de la seconde variété de termes sur la base du nombre de fois (50) que chaque terme apparaît dans le document.

3. Le procédé mis en oeuvre par un processeur selon la revendication 1 ou la revendication 2 comprenant en outre l'étape de :

e) présentation des phrases thématiques à un utilisateur du processeur (11) dans un ordre dans lequel les phrases thématiques apparaissent dans le document.

4. Le procédé mis en oeuvre par un processeur selon l'une quelconque des revendications 1 à 3, dans lequel l'étape c) comprend l'incrémentation (58) de l'évaluation de chaque phrase pour chaque terme thématique apparaissant dans la phrase d'une quantité relative à la fréquence d'apparition du terme thématique dans le document.

5. Le procédé mis en oeuvre par un processeur selon l'une quelconque des revendications 1 à 4, comprenant, avant l'étape a), l'étape de réception de la valeur du second nombre à partir d'un dispositif d'entrée (14) accouplé au processeur (11).

6. Le procédé mis en oeuvre par un processeur selon l'une quelconque des revendications 1 à 5, dans lequel le premier nombre est au moins trois.